Contributions to Building Efficient and Robust State-Machine Replication Protocols

Author

  • Vivien Quéma
Abstract

State machine replication (SMR) is a software technique for tolerating failures using commodity hardware. The critical service to be made fault-tolerant is modeled by a state machine. Several, possibly different, copies of the state machine are then deployed on different nodes. Clients of the service access the replicas through an SMR protocol which ensures that, despite concurrency and failures, replicas perform client requests in the same order. Two objectives underlie the design and implementation of an SMR protocol: robustness and performance. Robustness conveys the ability to ensure availability (liveness) and one-copy semantics (safety) despite failures and asynchrony. Performance, on the other hand, measures the time it takes to respond to a request (latency) and the number of requests that can be processed per time unit (throughput).

In this thesis, we present two contributions to state machine replication. The first contribution is LCR, a uniform total order broadcast (UTO-broadcast) protocol that is throughput optimal in failure-free periods. LCR can be used to totally order the requests received by a replicated state machine. LCR has been designed for small clusters of homogeneous machines interconnected by a local area network. It relies on a perfect failure detector and tolerates the crash failure of all but one replica. It is based on a ring topology and relies only on point-to-point inter-process communication. We benchmark an implementation of LCR against two of the most widely used group communication packages and show that LCR provides higher throughput than both over a large number of setups.

The second contribution is Abstract, a new abstraction to simplify the design, proof and implementation of SMR protocols. Abstract focuses on the most robust class of SMR protocols, i.e., those tolerating arbitrary (client and replica) failures. Such protocols are called Byzantine Fault Tolerant (BFT) protocols. We treat a BFT protocol as a composition of instances of our abstraction. Each instance is developed and analyzed independently. To illustrate our approach, we first show how, with our abstraction, a protocol offering the benefits of Zyzzyva could have been developed using less than 24% of the actual code of Zyzzyva. We then present Aliph, a new BFT protocol that outperforms previous BFT protocols both in terms of latency (by up to 30%) and throughput (by up to 360%).
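The core requirement stated in the abstract, that replicas perform client requests in the same order, can be illustrated with a minimal sketch. This is not LCR, Abstract, or Aliph; it assumes the total order is already given and ignores failures entirely. The key-value service and all names (`KVStateMachine`, `replicate`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Deterministic state machine: a tiny key-value store (hypothetical service)."""
    state: dict = field(default_factory=dict)

    def apply(self, request):
        # A request is an (operation, key, value) tuple.
        op, key, value = request
        if op == "put":
            self.state[key] = value
        elif op == "append":
            self.state[key] = self.state.get(key, "") + value
        return self.state.get(key)


def replicate(ordered_requests, n_replicas=3):
    """Apply one totally ordered request sequence to every replica."""
    replicas = [KVStateMachine() for _ in range(n_replicas)]
    for request in ordered_requests:
        for replica in replicas:
            replica.apply(request)
    return replicas


requests = [("put", "x", "a"), ("append", "x", "b"), ("put", "y", "c")]
replicas = replicate(requests)

# Same order on every replica: all copies end in the same state.
same_order_agrees = all(r.state == replicas[0].state for r in replicas)

# A replica that sees the requests in a different order can diverge.
reordered = KVStateMachine()
for request in reversed(requests):
    reordered.apply(request)
diverges = reordered.state != replicas[0].state
```

Because each replica is deterministic, agreement on the request order is enough for the copies to stay identical. Providing that order despite failures is the job of the total order broadcast or BFT protocol, which this sketch deliberately leaves out.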

Similar resources

Towards Efficient and Robust BFT Protocols

Byzantine Fault Tolerant (BFT) protocols rely on state machine replication to handle arbitrary behaviors. Significant efforts have recently been made to strengthen these protocols in order to minimize the performance degradation in the presence of faulty components. In this paper, we focus on the potential damage that could be introduced from the client side of such protocols. In order to deal wit...

An Attack-Resilient Architecture for Large-Scale Intrusion-Tolerant Replication

This paper presents the first architecture for large-scale, wide-area intrusion-tolerant state machine replication that is specifically designed to perform well even when some of the servers are Byzantine. The architecture is hierarchical and runs attack-resilient state machine replication protocols within and among the wide-area sites. Given the constraints of the wide-area environment, we exp...

Byzantine Fault-Tolerance with Commutative Commands

State machine replication is a popular approach to increasing the availability of computer services. While it has been largely studied in the presence of crash-stop failures and malicious failures, all existing state machine replication protocols that provide Byzantine fault-tolerance implement some variant of atomic broadcast. In this context, this paper makes two contributions. First, it prese...

BChain: Byzantine Replication with High Throughput and Embedded Reconfiguration

In this paper, we describe the design and implementation of BChain, a Byzantine fault-tolerant state machine replication protocol, which performs comparably to other modern protocols in fault-free cases, but in the face of failures can also quickly recover its steady state performance. Building on chain replication, BChain achieves high throughput and low latency under high client load. At the ...

Efficient Synchronous Byzantine Consensus

We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates f Byzantine faults in an asynchronous setting using 3f + 1 replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to n =...

Publication date: 2010